EYE MOVEMENTS CHARACTERIZING FOR THE ASSESSMENT OF EXPERTISE IN SOURCE CODE READING
INTRODUCTION
Expertise is an important attribute of programmers that strongly influences how they perform programming-related tasks. However, years of programming experience are not the only measure related to expertise (Feigenspan et al., 2012). Researchers have shown that expertise reflects how well programmers perform, not necessarily how long they have trained (Shanteau, 1992). This understanding of expertise opens the door to adopting strategies from other domains of deliberate practice (well-designed, purposeful, and guided practice) to develop expert performance in the software engineering field (Ericsson & Lehmann, 1996; Anders Ericsson, 2008). The authors claim that fully focused practice, designed by professionals and meeting the following conditions, significantly affects the development of an individual’s expertise. Those conditions include defining the goal of the given training activity, getting appropriate and immediate feedback, and dividing the task into smaller sub-tasks to practice instead of repeating the full task as a whole. Therefore, approaches that follow these principles of deliberate practice could also be applied to improve programming skills with more focus and attention.
At the same time, different levels of programming expertise require different mental workloads to solve a programming task (Kuric & Bieliková, 2014). Program comprehension, for example, is a cognitive process in which developers use their knowledge, together with a mental model, to acquire information from the code and draw conclusions (Feigenspan et al., 2012). A developer’s programming experience is a key factor in easing this internal cognitive process.
Over the years, researchers have made many attempts to measure programming experience and to evaluate programmers’ activities with respect to their expertise (Arisholm et al., 2007; Feigenspan et al., 2012), as well as to understand how developers’ behaviors during programming activities correspond to their expertise level, such as in source code comprehension (Soloway & Ehrlich, 1984; Bednarik & Tukiainen, 2006; Feigenspan et al., 2011), maintenance (Yu et al., 2019), and debugging (Vessey, 1985; Alqadi & Maletic, 2017). Studies have determined expertise in many ways, including questionnaires completed before the experiment (Feigenspan et al., 2011), years of programming, education level (Ricca et al., 2007; Kevic et al., 2015; Busjahn et al., 2015; Abid, Maletic, et al., 2019), and programmers’ self-evaluation of their experience (Feigenspan et al., 2012). Further studies have used eye-tracking technologies to examine the effect of expertise level on comprehension during code reading (Busjahn et al., 2011), reviewing (Uwano et al., 2006), and summarization (Rodeghero et al., 2014). Another eye-tracking study evaluates the differences between the expertise and professional status of software developers (Soh et al., 2012).
Multiple programming languages exist (Shrestha et al., 2020), each of which is best suited for specific applications. In addition, given the need to stay up to date with new technologies and the growing availability of learning resources, becoming an expert is no longer limited by how long one has been on the job. Thus, new studies and methods are necessary to better measure these new levels of expertise. There is a need to clarify and redefine expertise rather than relying on professional status, education level, or years of programming. Moreover, with fast improvements in the software engineering field, developer expertise assessment should improve at a matching pace. One example is the adoption of more realistic methods that evaluate developer expertise in a realistic environment while developers carry out programming activities (e.g., eye movement tracking during code reading).
The general objective of this research is to provide a reliable approach to characterize developers’ expertise level as expert or novice based on findings from their eye movement data. One way is to examine the differences between experts and novices at a finer level of granularity, namely the source code element level. For this purpose, we use multiple source code parts to evaluate the impact of expertise on navigating the code area. Through this approach, we provide an in-depth analysis of the reading process that includes the statement and term levels to capture more detailed information. Our work also focuses on the impact of expertise on viewing the source code elements regardless of the time duration. This study analyzes an existing eye-tracking data set to study programmers’ behaviors and strategies during source code reading in the context of a bug fix. The data was collected in 2015 (Kevic et al., 2015). Kevic et al. conducted an eye-tracking study on large source code in an open-source system using the Eclipse plugin iTrace (Shaffer et al., 2015; Guarnera et al., 2018). Twenty-two programmers at two different expertise levels (experts and novices) participated in this study. The participants consisted of twelve professional programmers working in industry and ten computer science students.
In another evaluation study using a different dataset, we aim to use eye movement metrics to find a distinct pattern that differentiates subjects in a program comprehension task according to their expertise. To achieve this goal, we adapt multiple metrics to describe the differences between experts’ and novices’ visual behaviors when solving comprehension tasks. In the same study, we perform an empirical investigation of which types of expertise metrics correlate best with eye-tracking parameters. To validate the results, we choose the metrics from three main categories: fixation-related metrics, saccade-related metrics, and pupil dilation metrics. For this analysis, we utilize the EMIP distributed eye-tracking dataset (see section 5.1); this open dataset includes the gaze data of 216 developers, each of whom solved two comprehension programs (Bednarik et al., 2020).
To analyze pupillometry data before comparing subjects, one way is to predefine pupil peak values (thresholds). One can then find and average the changes in pupil size relative to the baseline for each developer up to each threshold. According to Beatty’s 1982 review (Beatty, 1982), Hakerem and Sutton made one of the first attempts at pupillometric analysis at the visual threshold (Hakerem & Sutton, 1966). We adopt this approach, which has been used previously in multiple studies to analyze pupil dilation (Klingner, 2010; Fritz et al., 2014; Eckstein et al., 2017).
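As an illustration only, the baseline-relative averaging described above might be sketched as follows. The function name, the sample values, the thresholds, and the choice to keep only positive changes (dilations) at or below each threshold are assumptions made for this sketch; they are not details taken from the cited studies.

```python
from statistics import mean


def mean_dilation_up_to(pupil_sizes_mm, baseline_mm, thresholds_mm):
    """Average baseline-relative dilation for one developer, per threshold.

    Hypothetical helper: for each predefined threshold, it averages the
    positive changes in pupil size (sample minus baseline) that do not
    exceed that threshold. Returns None where no sample qualifies.
    """
    # Change of every sample relative to the developer's own baseline.
    changes = [size - baseline_mm for size in pupil_sizes_mm]

    averages = {}
    for threshold in thresholds_mm:
        # Keep only dilations (positive changes) up to the threshold.
        kept = [change for change in changes if 0 < change <= threshold]
        averages[threshold] = mean(kept) if kept else None
    return averages


if __name__ == "__main__":
    # Made-up pupil diameters (mm) for a single developer.
    samples = [3.0, 3.1, 3.2, 3.4, 3.6]
    print(mean_dilation_up_to(samples, baseline_mm=3.0, thresholds_mm=[0.25, 0.5]))
```

Expressing each sample as a change from the developer's own baseline, rather than as an absolute diameter, is what makes the per-threshold averages comparable across developers whose resting pupil sizes differ.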
Pupil size is influenced by cognitive load, and the pupil usually dilates up to 0.5 mm above its baseline value (Beatty, 1982; Beatty & Lucero-Wagoner, 2000; Sirois & Brisson, 2014). Many early studies were conducted to demonstrate that cognitive workload is reflected in pupil size (Hess & Polt, 1964; Hoeks & Levelt, 1993; Marshall, 2002).
Thus, pupil dilation has been used in many studies to measure mental workload (Bailey & Iqbal, 2008; Klingner et al., 2011), to explore the relationship between cognitive ability and the pupil baseline (Tsukahara et al., 2016), and, in combination with other physiological measurements, to assess cognitive load (Hogervorst et al., 2014).
This study uses changes in the pupillary response of expert and novice developers as an indicator of the underlying cognitive effort they exert. To support a valid dilation analysis and a fair comparison between developers, this study includes only developers who solved tasks in the same language (Java). The results show that, on average, less experienced developers have a statistically higher average of fixations with dilated pupils than skilled developers. This result suggests that novices apply more attentional focus and mental effort to solve the comprehension task than expert developers.